ENH: Add use_nullable_dtypes and nullable_backend global option to read_orc #49827

mroeschke · 2022-11-22T01:56:36Z

xref #48957 (ENH: Add global option io.nullable_type="pandas"|"pyarrow" to control IO reader use_nullable_dtype)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Additionally

Show the dtypes in the whatsnew for clarity
Note in the docs that read_csv also supports the global nullable_backend option
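
For illustration, a minimal sketch of the round trip this change is meant to support: write a small frame to ORC in memory, then read it back with pyarrow-backed dtypes. The function name `roundtrip_orc_pyarrow_backed` and the column names are hypothetical. One loud assumption: the sketch uses the `dtype_backend="pyarrow"` keyword, which, as far as I know, is how released pandas 2.x spells this behaviour; on the branch this PR targets, the equivalent would be `use_nullable_dtypes=True` under the `io.nullable_backend` option. It returns `None` when pandas or pyarrow (with ORC support) is not installed.

```python
import io


def roundtrip_orc_pyarrow_backed():
    """Write a small frame to ORC in memory and read it back pyarrow-backed.

    Returns the resulting DataFrame, or None if pandas/pyarrow is unavailable.
    """
    try:
        import pandas as pd
        import pyarrow.orc  # noqa: F401  (pandas needs pyarrow for ORC I/O)
    except ImportError:
        return None

    df = pd.DataFrame(
        {
            "int": [1, 2, 3],
            "float_with_nan": [2.0, float("nan"), 3.0],
            "bool": [True, False, True],
        }
    )
    # Copy before writing so the original frame stays untouched for comparison.
    bytes_data = df.copy().to_orc()

    # Assumption: released pandas 2.x spelling. On the branch this PR targets,
    # the equivalent call would be read_orc(..., use_nullable_dtypes=True)
    # inside pd.option_context("io.nullable_backend", "pyarrow").
    return pd.read_orc(io.BytesIO(bytes_data), dtype_backend="pyarrow")


result = roundtrip_orc_pyarrow_backed()
if result is not None:
    # Each column should report an Arrow-backed dtype.
    print(result.dtypes)
```

The NaN in `float_with_nan` should come back as a missing value rather than a float NaN, which is the main visible difference from the default NumPy-backed result.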

lithomas1 · 2022-11-22T02:32:26Z

doc/source/whatsnew/v2.0.0.rst

@@ -33,7 +33,7 @@ sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (
 Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet` and :func:`read_csv` (with ``engine="pyarrow"``)
+A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet`, :func:`read_orc` and :func:`read_csv` (with ``engine="pyarrow"``)


Off-topic, but it seems read_excel supports use_nullable_dtypes but not io.nullable_backend. We should fix this.

Good point. I'll add this in a follow up PR.

phofl · 2022-11-22T18:19:02Z

pandas/io/orc.py

+
+        .. note
+
+            Currently only ``io.nullable_backend`` set to ``"pyarrow"`` is supported.


Do you intend to implement the flag for pandas as well?

I would want to do this in a follow up PR (unless you're interested :) )

No, that's fine. I just wanted to understand whether this is intended at all.

I want to tackle JSON and SQL next.

phofl · 2022-11-22T18:28:49Z

pandas/tests/io/test_orc.py

+            "float": np.arange(4.0, 7.0, dtype="float64"),
+            "float_with_nan": [2.0, np.nan, 3.0],
+            "bool": [True, False, None],
+            "datetime": pd.date_range("20130101", periods=3),


Can you add bool without na?

Sure, added.

phofl · 2022-11-22T18:29:54Z

pandas/tests/io/test_orc.py

+            ],
+        }
+    )
+    bytes_data = df.to_orc()


Just to avoid something subtle: can you do df.copy().to… since you are using df below?

Good idea. Added the copy


ENH: Add use_nullable_dtypes and nullable_backend to read_orc

d8164b2

mroeschke added the Enhancement, IO Data, and Arrow labels Nov 22, 2022

lithomas1 reviewed Nov 22, 2022


mroeschke added 2 commits November 22, 2022 10:09

Merge remote-tracking branch 'upstream/main' into enh/pyarrow_types/orc

7859083

Skip if not required pa version

bfdb535

phofl reviewed Nov 22, 2022


mroeschke added 2 commits November 22, 2022 13:18

Merge remote-tracking branch 'upstream/main' into enh/pyarrow_types/orc

843db9c

Address review

a81e81c

phofl approved these changes Nov 22, 2022


mroeschke added this to the 2.0 milestone Nov 23, 2022

mroeschke merged commit d8cfbd2 into pandas-dev:main Nov 23, 2022

mroeschke deleted the enh/pyarrow_types/orc branch November 23, 2022 00:00
